Entity Search with NECESSITY
Abstract
Loosely structured heterogeneous information spaces are typically created by merging data from a variety of applications and information sources. A common problem these information spaces need to address is that different pieces of data describe the same real-world entities (e.g., people, conferences, organizations). In this demo, we introduce necessity, an efficient and scalable entity store. necessity is able to handle a large number of entities and at the same time provides an efficient and highly accurate entity search functionality for heterogeneous and partially structured queries that follow the vision of dataspaces.

The name of the system was inspired by the philosopher William of Ockham's famous principle: “Entities must not be multiplied beyond necessity.”
Copyright is held by the author/owner.
Twelfth International Workshop on the Web and Databases (WebDB 2009), June 28, 2009, Providence, Rhode Island, USA.

1. MOTIVATION AND OUTLINE
We are currently witnessing a rapid increase in the number of loosely structured heterogeneous information spaces: collections of data coming from a variety of applications and information sources. One common problem these information spaces face is managing their entities (e.g., organizations, events), since the same real-world entity is typically given several different representations. The necessity entity store addresses this challenge. Our system can handle a large number of entities and at the same time provides an efficient and highly accurate entity search functionality.

necessity stores entity profiles composed of sets of attribute-value pairs. It allows efficient entity search over these loosely structured heterogeneous information spaces, with queries being conditions on the entities' attributes or values. Our approach contributes to the idea of realizing dataspaces as envisioned in [3]. This includes developing data models and designing search methods for a large collection of interrelated data, even though the entity queries are specific to our context.

necessity was implemented and evaluated in the context of the Entity Name System (ENS) for the OKKAM project. The aim of ENS is to foster the global re-use of entity identifiers and to mediate between existing identifiers for individual entities (details are available in [1]). ENS receives queries and checks whether the entity described in each query already exists in OKKAM. If it does, OKKAM returns the corresponding identifier. The core benefit of ENS is that it eases the integration of external applications. For example, repositories of personal information management systems can rely on the service provided by ENS to create the URIs of their entities. The integration challenge of knowing which representations in different repositories refer to the same entity is then resolved by the use of shared identifiers issued by OKKAM.

Other types of applications can also profit from our approach. One example is entity search in collaboratively authored information spaces such as Wikipedia. Each Wikipedia entry is written by various contributors who are not required to follow a specific format or schema consistently. Moreover, processing a human query over Wikipedia data could benefit from matching the query against the heterogeneous data of the entities. Another example of target applications for necessity is entity search engines. These applications are built upon information extracted from Web pages. Integrating this extracted data poses the matching challenge of effectively identifying and merging the data that refer to the same real-world entities. In addition, searching for a specific entity among this plethora of entities requires advanced matching functionality.

The rest of this paper is organized as follows. Section 2 presents the necessity entity store, with the main focus on the entity search process.
Section 3 describes the demo scenario, and Section 4 presents conclusions along with future work.

2. ENTITY SEARCH PROCESS
Entities in necessity are modeled as sets of attribute-value pairs, a representation similar to the one in the dataspace proposal [3]. For example, a person entity is represented by a name, an affiliation, an email address, and whatever other information is available. Entity search allows users or applications to retrieve the entities already in necessity (ideally only one) that best match an entity description provided to the system as a query. An entity query is a set of predicates, where each predicate is either a keyword or an attribute-value pair, e.g., Q1: name=“John Smith” EPFL, and Q2: name=Smith affiliation=EPFL. Q1 combines an attribute-value predicate on the name with the bare keyword EPFL, whereas Q2 consists only of attribute-value predicates.

http://www.okkam.org
http://www.wikipedia.org
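The entity and query model of Section 2 can be made concrete with a small sketch. The paper does not say how necessity matches or ranks entities, so the scoring below (the fraction of query predicates an entity satisfies, using case-insensitive token containment) is an assumption chosen only for illustration and is not necessity's actual algorithm. The names `Entity`, `Predicate`, `parse_query`, `matches`, and `search`, the identifiers `e1` to `e3`, and the sample profiles are all hypothetical. Straight double quotes stand in for the curly quotes used in Q1.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Entity:
    """An entity profile: an identifier plus a set of attribute-value pairs."""
    entity_id: str
    attributes: dict[str, str]


@dataclass
class Predicate:
    """One query predicate; attribute is None for a bare keyword."""
    attribute: str | None
    value: str


# Accepted forms: attr="quoted value" | attr=value | "quoted keyword" | keyword
_PREDICATE = re.compile(r'(?:(\w+)=)?(?:"([^"]*)"|(\S+))')


def parse_query(query: str) -> list[Predicate]:
    """Split an entity query such as name="John Smith" EPFL into predicates."""
    predicates = []
    for match in _PREDICATE.finditer(query):
        attribute, quoted, bare = match.groups()
        value = quoted if quoted is not None else bare
        predicates.append(Predicate(attribute, value))
    return predicates


def _tokens(text: str) -> set[str]:
    """Lower-cased word tokens, so matching ignores case and punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def matches(predicate: Predicate, entity: Entity) -> bool:
    """Hypothetical rule: all tokens of the predicate value must occur in the entity's value."""
    wanted = _tokens(predicate.value)
    if not wanted:
        return False
    if predicate.attribute is not None:
        value = entity.attributes.get(predicate.attribute)
        return value is not None and wanted <= _tokens(value)
    # A bare keyword may match the value of any attribute.
    return any(wanted <= _tokens(v) for v in entity.attributes.values())


def search(store: list[Entity], query: str, k: int = 1) -> list[tuple[float, str]]:
    """Hypothetical ranking (NOT necessity's method): fraction of predicates satisfied, top k."""
    predicates = parse_query(query)
    if not predicates:
        return []
    scored = []
    for entity in store:
        score = sum(matches(p, entity) for p in predicates) / len(predicates)
        if score > 0:
            scored.append((score, entity.entity_id))
    # Highest score first; ties broken by identifier for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


store = [
    Entity("e1", {"name": "John Smith", "affiliation": "EPFL"}),
    Entity("e2", {"name": "Jane Smith", "affiliation": "MIT"}),
    Entity("e3", {"name": "John Doe", "affiliation": "EPFL"}),
]
print(search(store, 'name="John Smith" EPFL'))            # [(1.0, 'e1')]
print(search(store, "name=Smith affiliation=EPFL", k=3))  # [(1.0, 'e1'), (0.5, 'e2'), (0.5, 'e3')]
```

Treating a bare keyword as matching any attribute value is what lets a query like Q1 mix structured and unstructured predicates. With `k=1` the sketch returns a single best match, in line with the goal of retrieving ideally only one entity; a store holding a large number of entities would need an index rather than the linear scan used here.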
Similar resources
Using Internal Auditing in E-Banks and E-Credit Financial Institutes
Internal audit is a process effected by an entity's board of directors, management and other personnel, designed to provide reasonable assurance regarding the achievement of objectives relating to operations, reporting and compliance. This definition reflects certain fundamental concepts. Internal control is: • Geared to achievement of objectives in one or more categories operations, reporti...
Improving Persian Named Entity Recognition Using the Ezafe
Named entity recognition is a process in which people’s names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Person Name Recognition for Arabic by Injecting Name-Candidate Words into Conditional Random Fields
Named Entity Recognition and Extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility to resolve fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
A Semantic Parser for Neuro-Degenerative Disease Knowledge Discovery
The ever-increasing size of the biomedical literature makes tapping into implicit knowledge in scientific literature a necessity for knowledge discovery. In this paper, a semantic parser for recognizing semantic roles and named entities in individual sentences of schizophrenia-related scientific abstracts is described. The named entity recognizer, CRFNER, outperforms ABNER in biological named entit...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Publication date: 2009